-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(algorithm): support biased second order random walk #280
feat(algorithm): support biased second order random walk #280
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #280 +/- ##
============================================
- Coverage 85.03% 84.99% -0.04%
- Complexity 3246 3291 +45
============================================
Files 345 349 +4
Lines 12298 12472 +174
Branches 1102 1129 +27
============================================
+ Hits 10458 10601 +143
- Misses 1315 1328 +13
- Partials 525 543 +18 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some tiny comments
} | ||
|
||
// weight threshold truncation | ||
if ((Double) weight.value() < this.minWeightThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call weight.doubleValue() here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Property value may not be a numeric value.
} | ||
|
||
/** | ||
* get edge weight by weight property |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Get the weight of a edge by its weight property"
/** | ||
* get edge weight by weight property | ||
*/ | ||
private Value getWeight(Edge edge) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefer getEdgeWeight()
} | ||
LOG.info("[RandomWalk] algorithm param, {}: {}", OPTION_WALK_LENGTH, walkLength); | ||
LOG.info("[RandomWalk] algorithm param, {}: {}", OPTION_WALK_LENGTH, this.walkLength); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can add a common method like logAlgorithmParam(name, value)
, and just use the this.name()
as logged algorithm name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, I feel that the logs of algorithm param here should be placed in the framework. I'll remove thoes logs.
Thank you for your guidance. |
} | ||
|
||
// weight threshold truncation | ||
if ((Double) weight.value() < this.minWeightThreshold) { | ||
DoubleValue weight = (DoubleValue) property; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on a double value like:
- double weight = this.defaultWeight;
- weight = property..doubleValue() if checked ok
- do truncation
- return weight
private Double calculateWeight(Id preVertexId, IdList preVertexAdjacenceIdList, | ||
Id nextVertexId, Value weight) { | ||
private Double calculateEdgeWeight(Id preVertexId, IdList preVertexAdjacenceIdList, | ||
Id nextVertexId, double weight) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also keep double finalWeight
and return double
?
double weight = this.getEdgeWeight(edge); | ||
Double finalWeight = this.calculateEdgeWeight(preVertexId, preVertexAdjacenceIdList, | ||
edge.targetId(), weight); | ||
weightList.add(finalWeight); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a mark here: TODO: improve to avoid OOM
Edge selectedEdge = null; | ||
int randomNum = random.nextInt(edges.size()); | ||
private Edge randomSelectEdge(Id preVertexId, IdList preVertexAdjacenceIdList, Edges edges) { | ||
List<Double> weightList = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add a mark here: TODO: use primitive array instead, like DoubleArray, in order to reduce memory fragmentation generated during calculations
the same to https://github.com/search?q=repo%3Aapache%2Fincubator-hugegraph-computer+path%3A%2F%5Ecomputer-algorithm%5C%2F%2F++new+ArrayList&type=code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean double[]
? I'll submit a new issue for this and try to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but we can use third-party libraries such as eclipse collections: https://eclipse.dev/collections/javadoc/11.1.0/org/eclipse/collections/api/list/primitive/DoubleList.html
some libs of primitive collections:
- eclipse-collections: https://github.com/eclipse/eclipse-collections
- hppc: https://github.com/carrotsearch/hppc
- fastutil: https://github.com/vigna/fastutil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome
Edge selectedEdge = null; | ||
int randomNum = random.nextInt(edges.size()); | ||
private Edge randomSelectEdge(Id preVertexId, IdList preVertexAdjacenceIdList, Edges edges) { | ||
List<Double> weightList = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but we can use third-party libraries such as eclipse collections: https://eclipse.dev/collections/javadoc/11.1.0/org/eclipse/collections/api/list/primitive/DoubleList.html
some libs of primitive collections:
- eclipse-collections: https://github.com/eclipse/eclipse-collections
- hppc: https://github.com/carrotsearch/hppc
- fastutil: https://github.com/vigna/fastutil
@@ -128,6 +128,8 @@ private static List<Serializable> expressions( | |||
if (filter.size() == 0) { | |||
return PASS; | |||
} | |||
// TODO: use primitive array instead, like DoubleArray, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we only need to mark lists about algorithms.
} | ||
|
||
@Override | ||
protected List<String> value(Vertex vertex) { | ||
IdListList value = vertex.value(); | ||
// TODO: use primitive array instead, like DoubleArray, | ||
// in order to reduce memory fragmentation generated during calculations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List<String>
doesn't require replacement(because there is no string[]
type), and the propValues doesn't look like very large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
THX~
Thank you for your guidance. |
Purpose of the PR
Main Changes
The current random walk algorithm requires 2 additional features.
Add the following parameters:
String weightProperty
. To implement biased random walk. The higher the weight, the higher the probability of walking.Double defaultWeight
. Provide a default value if the weight is null.Double minWeightThreshold
. Truncate when weight is less than the threshold to avoid too small weight.Double maxWeightThreshold
. Truncate when weight exceeds the threshold to avoid overweighting.Double returnFactor
. Controls the probability of re-walk to a previously walked vertex.Double inOutFactor
. Controls whether to walk inward or outward.For more details about
returnFactor
andinOutFactor
, please refer to the paper《node2vec: scalable feature learning for networks》.Verifying these changes
Unit test:
org.apache.hugegraph.computer.algorithm.sampling.RandomWalkTest
Does this PR potentially affect the following parts?
Documentation Status
Doc - TODO
Doc - Done
Doc - No Need